add support for PDF output via Pandoc #28
base: master
I tried to run the example but got this:
Regarding exposing the parse tree to the user, I basically agree with the points you brought up. I thought about it previously and came to the conclusion that it would be too much of a maintenance drag. Docstrings are Markdown strings, so that's part of the interface already. But exposing a particular Markdown parser's internals would make it more difficult to move away from 3bmd if need be. Transforms bundled with PAX, either Lua or Lisp, are fine. As to Lua, Lisp or a combination, it depends. It's good to keep tightly coupled pieces of code without a good interface between them close together (e.g., in the same function). The changes in the draft look good in that regard. If the time comes that the multitude of formats makes this unwieldy, then there will be motivation to come up with a stable interface between them. There is none currently, and maybe with Pandoc, additional formats would not require code changes. Re docstring indentation, see the comments in the review. Re subheadings, in some places I probably still use explicit '####' headings in function docstrings, although I've been trying to move away from that.
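`src/document/document.lisp` (outdated)
```lisp
;; Emit a Pandoc YAML metadata block. The title is taken from the
;; contents of the first parse-tree element, followed by any extra
;; metadata from *PANDOC-METADATA-BLOCK*.
(format stream "---~%")
(format stream "title: ~{~A~}~%" (pt-get (pop parse-tree) :contents))
(format stream "~A~&---~%" *pandoc-metadata-block*)
(pop parse-tree) ; Skip the link to the title
```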
For the final version, check that this skips what it thinks it does.
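One way to make such a check explicit is to verify an element's tag before skipping it, so that a change in the parse tree's layout signals an error instead of silently dropping the wrong element. The sketch below assumes that parse-tree elements are lists headed by a keyword tag, as in 3bmd's output; `CHECK-ELEMENT` is a hypothetical helper, and `:heading`/`:plain` in the example are only illustrative tags.

```lisp
;; Hypothetical helper for illustration; not part of MGL-PAX.
(defun check-element (tag element)
  "Return ELEMENT if it is a list whose first element is TAG;
otherwise signal an error describing what was found instead."
  (unless (and (consp element) (eq (first element) tag))
    (error "Expected a ~S element, got ~S." tag element))
  element)

;; Usage: verify what is being skipped instead of popping blindly.
(let ((parse-tree (list '(:heading :level 1 :contents ("Title"))
                        '(:plain "Body text."))))
  (check-element :heading (pop parse-tree))
  parse-tree)
```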
I've done some preliminary checking for this, let me know if it's enough.
What version of Pandoc did you use? I imagine it's one of the latest (i.e., post-3.0.0); I've been using 2.19.2 since that's the latest version provided with Guix, so that might be the reason for the error. I guess I'll have to manually update.
Ok, this makes sense.
I was thinking of parts within the code which explicitly check for the
Thanks for the review, I'll address all your comments in my next update to the PR.
I just tried with that version of Pandoc and the manual compiles fine for me. Can you add the `--verbose` option, as in the following?
```lisp
(let ((pax::*pandoc-options* '("--verbose")))
  (uiop:with-output-file (stream "mgl-pax.pdf" :if-exists :supersede)
    (pax:document mgl-pax::@pax-manual :stream stream :format :pandoc-pdf)))
```
Here is a shorter one. If I try it on
You must have all the required LaTeX packages installed, because otherwise I think the compilation would fail before the error you encounter. Can you try with the
I can compile the generated TeX file with latexmk if I replace
with
at multiple locations.
I've made some progress. The I think the best way to convert links to Also, let me know what you think of the
(Branch updated from c633d97 to fe048e8.)
I do not understand how you are encountering these errors. Can you dump the parse tree and the Markdown generated from the parse tree in
```lisp
(uiop:with-output-file (s "/tmp/mgl-pax.lisp-expr" :if-exists :supersede)
  (format s "~S~%" parse-tree))
(uiop:with-output-file (s "/tmp/mgl-pax.markdown" :if-exists :supersede)
  (print-markdown parse-tree s :format :markdown))
```
at the beginning of
Also, is anything printed to
Great. I have managed to detect local links so that only those are converted to The only major blockers left are getting the manual to compile on your side and correctly escaping TeX characters. I have made a start on escaping them, but the code is weird for now. The rest is just cleaning up the PR.
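For comparison, a plain escaping pass over LaTeX's ten special characters (`# $ % & _ { } ~ ^ \`) might look like the sketch below. It is not the PR's code, and `ESCAPE-TEX` is a hypothetical name. It is presumably only needed for text that ends up inside raw LaTeX, since Pandoc escapes ordinary Markdown text itself when writing LaTeX.

```lisp
;; Hypothetical helper for illustration; not the code in this PR.
(defun escape-tex (string)
  "Return a copy of STRING in which LaTeX's special characters are
escaped so that they are typeset literally."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               ((#\# #\$ #\% #\& #\_ #\{ #\})
                ;; These can simply be prefixed with a backslash.
                (write-char #\\ out)
                (write-char char out))
               ;; These need text-mode commands instead.
               (#\~ (write-string "\\textasciitilde{}" out))
               (#\^ (write-string "\\textasciicircum{}" out))
               (#\\ (write-string "\\textbackslash{}" out))
               (t (write-char char out))))))
```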
Here is the short version:
Only documenting
```lisp
(uiop:with-output-file (stream "mgl-pax.pdf" :if-exists :supersede)
```
`mgl-pax.lisp-expr`:
`mgl-pax.markdown`:
I think the correct way to test for local links is
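As an illustration only: assuming that references to other parts of the same document are emitted as bare fragment URLs (such as `#SOME-ANCHOR`), while links elsewhere carry a scheme or a file name, a local-link test could be as simple as the sketch below. `LOCAL-LINK-P` is a hypothetical name.

```lisp
;; Hypothetical predicate for illustration; assumes local links are
;; bare fragment URLs.
(defun local-link-p (url)
  "True if URL is a bare fragment reference such as \"#FOO\", i.e., it
points into the current document rather than to another page or site."
  (and (plusp (length url))
       (char= (char url 0) #\#)))
```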
I think I know what is causing the problems on your end: you are likely using a newer version of 3bmd which escapes the braces (3b/3bmd@18a59d3). I have been using 3bmd 4e08d82 (https://github.com/3b/3bmd/tree/4e08d82d7c8fb1b8fc708c87f4d9d13a1ab490cb). Should the braces really be escaped, I wonder? They don't do anything specific in Markdown, and they are useful in passing through LaTeX commands for Pandoc, and also for the raw attribute syntax. It seems nontrivial to deal with this. And in fact, escaping the brackets might cause problems with optional arguments to LaTeX commands. For example,
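To make the effect concrete, here is a simplified copy of the escaping loop in 3bmd's Markdown printer: the `*in-code*` check is dropped and the escaped character set is a parameter, and `ESCAPE-MARKDOWN` is a hypothetical name. With braces in the set, a LaTeX command in a docstring no longer reaches Pandoc as written.

```lisp
;; Simplified, hypothetical version of 3bmd's escaping loop.
(defun escape-markdown (string special-chars)
  "Return STRING with a backslash inserted before every character that
appears in SPECIAL-CHARS, in the manner of a Markdown printer."
  (with-output-to-string (out)
    (loop for char across string
          do (when (find char special-chars)
               (write-char #\\ out))
             (write-char char out))))

;; With braces in the set, they are backslash-escaped.
(escape-markdown "\\textsf{PAX}" "*_`[]{}")  ; => "\\textsf\\{PAX\\}"
;; Without them, the LaTeX command survives intact.
(escape-markdown "\\textsf{PAX}" "*_`[]")    ; => "\\textsf{PAX}"
```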
Yes, I'm using the latest 3bmd (also in Quicklisp, I believe).
Braces are defined as special characters in Markdown (https://meta.stackexchange.com/questions/29063/why-is-a-special-character-in-markdown).
Escaping braces in 3bmd was necessary for MathJax to work.
What about indenting LaTeX stuff by 4 spaces so that it's parsed as a verbatim block?
They are special characters, but they are not defined by the specification as doing anything, and Pandoc relies on them heavily for much of its Markdown-compatible special syntax.
Inline LaTeX cannot be indented. Pandoc's raw attribute syntax (i.e., `raw latex`{=latex}) is meant to allow nontrivial LaTeX commands that do not consist of just backslash and braces, for example if some characters would otherwise be escaped, but escaping braces makes it unusable.
What was the problem with MathJax? Pandoc seems to support MathJax without issue. It is my impression that liberally including LaTeX commands in Pandoc Markdown files is common for those who use it to generate LaTeX. At least I do so. It seems to me that escaping braces from 3bmd makes this impossible. Specially handling the raw attribute syntax would help, but would complicate 3bmd and still not allow the general case of simple LaTeX commands.
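For reference, Pandoc's inline raw attribute syntax looks like `` `\newpage`{=latex} ``: the content of the code span is passed through unchanged to LaTeX output and left out of other formats. A generator could produce such spans with a helper along these lines. This is a sketch, and `RAW-LATEX` is a hypothetical name. It picks a backtick fence longer than any backtick run in the content, as code spans require.

```lisp
;; Hypothetical helpers for illustration; not part of MGL-PAX or 3bmd.
(defun longest-backtick-run (string)
  "Return the length of the longest run of consecutive backticks in STRING."
  (let ((longest 0) (current 0))
    (loop for char across string
          do (if (char= char #\`)
                 (setf longest (max longest (incf current)))
                 (setf current 0)))
    longest))

(defun raw-latex (latex)
  "Wrap LATEX in Pandoc's inline raw attribute syntax so that it is
passed through to LaTeX output verbatim."
  (let* ((fence (make-string (1+ (longest-backtick-run latex))
                             :initial-element #\`))
         ;; Pad with spaces if the content starts or ends with a backtick,
         ;; so that the fence and the content do not merge.
         (pad (if (and (plusp (length latex))
                       (or (char= (char latex 0) #\`)
                           (char= (char latex (1- (length latex))) #\`)))
                  " "
                  "")))
    (format nil "~A~A~A~A~A{=latex}" fence pad latex pad fence)))
```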
I couldn't recall what the MathJax problem was, so I tested it with the latest 3BMD plus this change:
```diff
--- a/markdown-printer.lisp
+++ b/markdown-printer.lisp
@@ -27,7 +27,7 @@
 (defun print-md-escaped (string stream)
   (loop for char across string
-        do (when (and (not *in-code*) (find char "*_`[]{}"))
+        do (when (and (not *in-code*) (find char "*_`[]"))
              (write-char #\\ stream))
            (write-char char stream)
            (when (char= char #\Newline)
```
I must take back what I said. There seems to be no issue that I can see with MathJax. This way I can generate documentation for ***@***.***`. Looks quite nice. There are some regions where it gets confused about verbatim/non-verbatim, though.
So I think we can get this small change into 3BMD with the rationale you provided.
I'm likely to merge #29 soonish, which will likely result in lots of conflicts. I'm happy to resolve them if you prefer.
I can resolve the conflicts, don't worry. I'll try to get back to this PR soon, especially since braces are no longer escaped in 3BMD.
This is a PR aiming to implement #26.
This is a draft for now, but generating PDF output is already possible as follows:
(I thought I would need to make an octet stream, but I haven't run into any problems yet with the above.)
Though the PR is incomplete, it has brought some issues to light, which I would like to discuss.
First, I can see four ways to implement the PDF support:

1. Sprinkle `(eq *format* :pandoc-pdf)` tests everywhere so that the intermediate Markdown output need not be modified before passing it to Pandoc.
2. Generate Markdown output in the usual manner and post-process it via `POST-PROCESS-PARSE-TREE` (tentatively renamed from `POST-PROCESS-FOR-W3M` in this PR) before passing it to Pandoc. This would require some kind of `*MARKDOWN-SUBFORMAT*` (or `*MARKDOWN-SUPERFORMAT*`, to be more proper), similar to `*HTML-SUBFORMAT*`, so that within `CALL-WITH-FORMAT` we can bind `*FORMAT*` to `:MARKDOWN`, but still know within `DOCUMENT` that we want to generate PDF via Pandoc.
3. Same as 2, but use Pandoc filters rather than modifying the Markdown parse tree from Lisp.
4. A combination of 1 and either 2 or 3. A combination of 1 and 2 is what is currently done in this PR (there is no `*MARKDOWN-SUPERFORMAT*` yet, so it relies on non-`*FORMAT*`-specific behaviors). I haven't introduced my Lua filter yet, because I wanted to see how possible and easy it would be without it.

I think it would be easier to keep track of Pandoc PDF–specific requirements in a separate filter (be it Lisp or Lua), and thus I am partial to options 2 and 3; of course, this may just be because I am less familiar with the codebase. Between options 2 and 3, I am partial to option 3, first because the Lua filter could serve as an example to users of MGL-PAX who might want to further adjust the output, and second because I am lazy and the filter already exists.
In fact, another reason to prefer option 3 is that it basically offers users a hook to transform the MGL-PAX output in arbitrary ways, which is impossible (as far as I know) with the other formats. Perhaps MGL-PAX should provide a user hook to transform the parse tree before the output is generated. One possible problem with this is that if the intermediate Markdown format changes from one version to another, the filters or hooks may become incompatible. In the case of the Lua filter bundled with MGL-PAX (i.e., if we go for option 3), this isn't an issue since it can be updated at the same time as the intermediate Markdown representation.
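A hook along those lines could be as small as a list of functions applied in order to the parse tree before printing. The sketch below is hypothetical: neither `*PARSE-TREE-TRANSFORMS*` nor `APPLY-PARSE-TREE-TRANSFORMS` exists in MGL-PAX, and the `(:tag ...)` element shape and the `:raw-html` tag are assumptions about the 3bmd parse tree made for the example.

```lisp
;; Hypothetical user hook for illustration; not part of MGL-PAX.
(defvar *parse-tree-transforms* ()
  "A list of functions, each taking a parse tree and returning a
(possibly new) parse tree.")

(defun apply-parse-tree-transforms (parse-tree)
  "Run PARSE-TREE through each function in *PARSE-TREE-TRANSFORMS*, in order."
  (reduce (lambda (tree transform) (funcall transform tree))
          *parse-tree-transforms*
          :initial-value parse-tree))

;; Example transform: drop top-level elements tagged :RAW-HTML.
(defun remove-raw-html (parse-tree)
  (remove-if (lambda (element)
               (and (consp element) (eq (first element) :raw-html)))
             parse-tree))

;; Usage: returns ((:PLAIN "text")).
(let ((*parse-tree-transforms* (list #'remove-raw-html)))
  (apply-parse-tree-transforms '((:raw-html "<a id='x'></a>") (:plain "text"))))
```

Because such transforms operate on the parser's representation directly, they would inherit the compatibility problem described above whenever that representation changes.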
Which of the four methods would you prefer?
Finally, some peculiarities of the MGL-PAX documentation cause issues in the PDF output of the PAX manual:
The most problematic issue is that all MGL-PAX docstrings are indented by two spaces, except the first line, which causes Pandoc to misinterpret things like code blocks and I think some lists (e.g., the lists with ASDF system information). It's probably possible to preprocess the docstrings to deal with this, but since this is an issue specific to MGL-PAX docstrings, I don't think it should be handled by MGL-PAX itself. (The indented lines are also visible when browsing the docstrings from within Lisp.)
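If one did want to preprocess such docstrings, a common approach (similar in spirit to Python's `inspect.cleandoc`) is to leave the first line alone and strip the indentation shared by all remaining non-blank lines. Below is a sketch with hypothetical names, assuming that indentation uses spaces only.

```lisp
;; Hypothetical helpers for illustration; not part of MGL-PAX.
(defun split-lines (string)
  "Split STRING at newline characters into a list of strings."
  (loop for start = 0 then (1+ end)
        for end = (position #\Newline string :start start)
        collect (subseq string start end)
        while end))

(defun blank-line-p (line)
  "True if LINE contains only spaces and tabs."
  (every (lambda (char) (member char '(#\Space #\Tab))) line))

(defun leading-spaces (line)
  "Number of space characters at the start of LINE."
  (or (position-if-not (lambda (char) (char= char #\Space)) line)
      (length line)))

(defun dedent-docstring (docstring)
  "Remove the indentation common to all non-blank lines after the first.
The first line is left alone, and blank lines become empty."
  (let* ((lines (split-lines docstring))
         (body-lines (remove-if #'blank-line-p (rest lines)))
         (indent (if body-lines
                     (reduce #'min (mapcar #'leading-spaces body-lines))
                     0)))
    (format nil "~{~A~^~%~}"
            (cons (first lines)
                  (mapcar (lambda (line)
                            (if (blank-line-p line) "" (subseq line indent)))
                          (rest lines))))))
```

Since only the common indentation is removed, the extra four spaces that mark a code block relative to the surrounding text are preserved.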
The subheadings sometimes jump from `##` to `#####`. On second thought, this is not such a big issue. It seemed weird to me to skip a few heading levels, but the PDF output looks fine.
Note that in my Lua filter I've changed the `[in package ...]` 6th-level headings to an `\hbox{}` containing the same text, but in sans serif rather than bold, because I found it looked nicer. This PR doesn't contain this change yet, but if you don't agree with changing it to sans serif upon seeing it, perhaps it should be configurable in some variable.
Let me know what you think.